A Nonparametric Statistical Approach to Clustering via Mode Identification
Abstract
In this paper, we develop a mode-based clustering approach that applies new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is computed efficiently by an EM-style algorithm, namely Modal EM (MEM). This clustering method shares the major advantages of mixture-model-based clustering. Moreover, it requires no model fitting and ensures that every cluster corresponds to a bump of the density. A hierarchical clustering algorithm is also developed by applying MEM recursively to kernel density estimators with increasing bandwidths. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise cluster separability measure is defined using the ridgeline between the density bumps of two clusters. The ridgeline is computed by the Ridgeline EM (REM) algorithm, an extension of MEM. Based on this new measure, a cluster merging procedure is developed to guarantee strong separation between clusters. Experiments demonstrate that our clustering approach tends to combine the strengths of mixture-model-based and linkage-based clustering. Tests on both simulated and real data show that the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling.
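The mode-ascent idea in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimator with a single spherical bandwidth, which the abstract does not specify; under that assumption the EM-style iteration alternates between computing each kernel's posterior weight at the current point and moving the point to the weighted mean of the data. The names `mem_ascend`, `mode_clustering`, and `merge_tol` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def mem_ascend(x, data, bandwidth, max_iter=500, tol=1e-8):
    """Ascend from x to a local mode of a Gaussian kernel density estimate.

    Assumption: every kernel is Gaussian with covariance bandwidth**2 * I.
    """
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # E-step: posterior weight of each kernel at the current point,
        # computed in log space for numerical stability.
        sq_dist = np.sum((data - x) ** 2, axis=1)
        log_w = -0.5 * sq_dist / bandwidth ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # M-step: for equal-covariance Gaussian kernels the maximizer
        # is the weighted mean of the kernel centers.
        x_new = w @ data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def mode_clustering(data, bandwidth, merge_tol=1e-3):
    """Group points whose ascent paths end at the same mode.

    merge_tol is a hypothetical tolerance for treating two converged
    points as the same mode.
    """
    data = np.asarray(data, dtype=float)
    modes = []
    labels = np.empty(len(data), dtype=int)
    for i, point in enumerate(data):
        m = mem_ascend(point, data, bandwidth)
        for j, known in enumerate(modes):
            if np.linalg.norm(m - known) < merge_tol:
                labels[i] = j
                break
        else:
            # No known mode is close enough: record a new one.
            modes.append(m)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)


# Two small, well-separated groups of points (illustrative data).
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]])
points = np.vstack([offsets, offsets + 5.0])
labels, modes = mode_clustering(points, bandwidth=0.5)
```

Rerunning `mode_clustering` with a sequence of increasing bandwidths, starting each level from the modes found at the previous one, would give a rough analogue of the hierarchical variant mentioned in the abstract, again only as a sketch.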
Similar resources
Identification of Nonlinear Modal Interactions in a Beam-Mass-Spring-Damper System based on Mono-Frequency Vibration Response
In this paper, nonlinear modal interactions caused by one-to-three internal resonance in a beam-mass-spring-damper system are investigated based on nonlinear system identification. For this purpose, the equations governing the transverse vibrations of the beam and mass are analyzed via the multiple scale method and the vibration response of the system under primary resonance is extracted. Then,...
Clustering via nonparametric density estimation: an application to microarray data
Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables is much higher than the number of observations. Here, we present a novel approach to clustering of microarray data via nonparametric density estimation, based on the following steps:...
A nonparametric algorithm for clustering microarray data
Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables is much higher than the number of observations. Here, we present a novel approach to clustering of microarray data via nonparametric density estimation, based on the following steps:...
Clustering High-Dimensional Data Using Evidence of Multimodality
We suggest a nonparametric approach to clustering very high-dimensional data, designed particularly for problems where the mixture nature of a population is expressed through multimodality of its density. In such cases a technique based implicitly on mode-testing can be particularly effective. In principle, several alternative approaches could be used to assess the extent of multimodality, but ...
Feature Selection For High-Dimensional Clustering
We present a nonparametric method for selecting informative features in high-dimensional clustering problems. We start with a screening step that uses a test for multimodality. Then we apply kernel density estimation and mode clustering to the selected features. The output of the method consists of a list of relevant features, and cluster assignments. We provide explicit bounds on the error rat...
Journal title:
- Journal of Machine Learning Research
Volume 8
Pages: -
Publication date: 2007